feat(nic-stats): added NIC statistics in shared memory with FFI and RPC#674
feat(nic-stats): added NIC statistics in shared memory with FFI and RPC#674
Conversation
There was a problem hiding this comment.
Pull request overview
This PR aims to add NIC-level statistics collection in the dataplane and expose those counters through shared memory, FFI, and a new gRPC endpoint, aligning with issue #673’s goal of making hardware/driver-level metrics available to the controlplane.
Changes:
- Added NIC stat counter fields to the dataplane worker shared structures and updated the dataplane worker loop to periodically read
rte_eth_stats_get. - Added a new C API (
yanet_get_nic_counters) and Go FFI wrapper (DPConfig.NicCounters) for retrieving NIC counters. - Extended the Counters gRPC proto with a
NicRPC.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 12 comments.
Show a summary per file
| File | Description |
|---|---|
| lib/dataplane/config/zone.h | Adds NIC counter pointer fields to struct dp_worker. |
| lib/controlplane/config/zone.h | Adds cp_config_gen_get_nic_counter_storage() helper. |
| lib/controlplane/agent/agent.c | Adds yanet_get_nic_counters() counter-handle listing function. |
| dataplane/worker.c | Registers new counters, wires worker fields to counter storage, and periodically updates NIC stats. |
| dataplane/dataplane.c | Comments out basic rte_eth_stats_get logging in stat_thread(). |
| controlplane/ynpb/counters.proto | Adds rpc Nic(NicCounterRequest). |
| controlplane/ffi/shm.go | Adds DPConfig.NicCounters() FFI wrapper. |
| api/counter.h | Exposes yanet_get_nic_counters() in the public C header. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| struct counter_handle_list *list = (struct counter_handle_list *)malloc( | ||
| sizeof(struct counter_handle_list) + | ||
| sizeof(struct counter_handle) * count | ||
| ); | ||
|
|
||
| if (list == NULL) | ||
| return NULL; | ||
| list->instance_count = | ||
| ADDR_OF(&counter_storage->allocator)->instance_count; | ||
| list->count = count; | ||
| struct counter_handle *handlers = list->counters; | ||
|
|
||
| uint64_t out_idx = 0; | ||
| for (uint64_t idx = 0; idx < count; ++idx) { | ||
| if (!counter_name_matches_query( | ||
| names[idx].name, query, query_count | ||
| )) { | ||
| continue; | ||
| } | ||
| strtcpy(handlers[out_idx].name, names[idx].name, 60); | ||
| handlers[out_idx].size = names[idx].size; | ||
| handlers[out_idx].gen = names[idx].gen; | ||
| handlers[out_idx].value_handle = | ||
| counter_get_value_handle(idx, counter_storage); | ||
| out_idx++; | ||
| } |
There was a problem hiding this comment.
yanet_get_nic_counters allocates count handlers and sets list->count = count, but only fills entries that match the query. If there are fewer (or zero) matches, the remaining handlers are left uninitialized and will be encoded/returned as garbage. Compute match_count first (like the existing query-based counter listing logic earlier in this file), allocate based on match_count, and set list->count = match_count (return NULL when match_count == 0).
| cp_config_gen_get_nic_counter_storage( | ||
| struct cp_config_gen *config_gen, const char *device_name | ||
| ) { | ||
| return cp_config_gen_get_device_counter_storage(config_gen, device_name); |
There was a problem hiding this comment.
cp_config_gen_get_nic_counter_storage() currently just aliases cp_config_gen_get_device_counter_storage(), but NIC counters are named separately ("nic_*") and aren’t registered in the device counter registry by default. As-is, NIC lookups will return the same storage as device counters and won’t contain NIC metrics. Introduce a dedicated NIC counter storage entry in the counter storage registry (or explicitly extend device counters with NIC fields) so this function returns the correct storage.
| return cp_config_gen_get_device_counter_storage(config_gen, device_name); | |
| return cp_config_counter_storage_registry_lookup_nic( | |
| &config_gen->counter_storage_registry, device_name | |
| ); |
| {"rx", 2, (const size_t[]){offsetof(struct dp_worker, rx_count), offsetof(struct dp_worker, rx_size)}}, | ||
| {"tx", 2, (const size_t[]){offsetof(struct dp_worker, tx_count), offsetof(struct dp_worker, tx_size)}}, | ||
| {"remote_rx", 1, (const size_t[]){offsetof(struct dp_worker, remote_rx_count)}}, | ||
| {"remote_tx", 1, (const size_t[]){offsetof(struct dp_worker, remote_tx_count)},}, |
There was a problem hiding this comment.
Initializer for the worker_counter_info entry "remote_tx" has an extra trailing comma inside the struct initializer (..., (const size_t[]){...},},). This provides too many initializers for the anonymous struct and should fail compilation; remove the extra comma so the entry has exactly 3 fields like the others.
| {"remote_tx", 1, (const size_t[]){offsetof(struct dp_worker, remote_tx_count)},}, | |
| {"remote_tx", 1, (const size_t[]){offsetof(struct dp_worker, remote_tx_count)}}, |
| uint64_t iterations = 0; | ||
|
|
||
| while (1) { | ||
| iterations = (iterations + 1) % NIC_STATS_FREAQUENCY; | ||
| if (iterations == 0) worker_unload_nic_stats(worker); |
There was a problem hiding this comment.
iterations is a local variable initialized to 0 on each worker_loop_round() call, so the (iterations + 1) % NIC_STATS_FREAQUENCY counter resets every round. Since the inner while (1) typically exits once the packet fronts are empty, this likely never reaches 0 and worker_unload_nic_stats() won’t run (NIC stats won’t be updated). Persist the counter across rounds (e.g., store it in struct dataplane_worker/dp_worker, or use a time-based check).
|
|
||
| #include <rte_ethdev.h> | ||
|
|
||
| #define NIC_STATS_FREAQUENCY 1000 |
There was a problem hiding this comment.
Macro name NIC_STATS_FREAQUENCY is misspelled; rename to NIC_STATS_FREQUENCY to avoid propagating the typo into call sites and future code.
| #define NIC_STATS_FREAQUENCY 1000 | |
| #define NIC_STATS_FREQUENCY 1000 |
| // struct rte_eth_stats stats0[dataplane->device_count]; | ||
| for (uint16_t idx = 0; idx < dataplane->device_count; ++idx) { | ||
| rte_eth_stats_get( | ||
| dataplane->devices[idx].port_id, &stats0[idx] | ||
| ); | ||
| // rte_eth_stats_get( | ||
| // dataplane->devices[idx].port_id, &stats0[idx] | ||
| // ); | ||
| rte_eth_xstats_get( |
There was a problem hiding this comment.
Large blocks of commented-out rte_eth_stats_get/fprintf logic were left in stat_thread(). If this logging is no longer needed, remove the dead code; if it is still useful for debugging, consider guarding it behind a build flag (to keep the runtime path and source clean).
| // trade-off: nic stats unload not often | ||
| // bcs put it in second cache line |
There was a problem hiding this comment.
Comment uses informal abbreviations ("bcs", "unload not often") and is hard to parse. Please rephrase in clear English (e.g., explain the cache-line placement trade-off explicitly).
| // trade-off: nic stats unload not often | |
| // bcs put it in second cache line | |
| // Trade-off: place NIC statistics in the second cache line | |
| // because they are accessed less frequently than the fields above. |
| struct counter_storage *cs = cp_config_gen_get_nic_counter_storage( | ||
| cp_config_gen, device_name | ||
| ); | ||
|
|
||
| if (cs == NULL) { | ||
| cp_config_unlock(cp_config); | ||
| return NULL; | ||
| } | ||
| counter_storage = cs; | ||
| counter_registry = ADDR_OF(&counter_storage->registry); | ||
|
|
||
| uint64_t count = counter_registry->count; | ||
| struct counter *names = ADDR_OF(&counter_registry->names); |
There was a problem hiding this comment.
yanet_get_nic_counters looks up counter storage via cp_config_gen_get_nic_counter_storage(), which currently aliases cp_config_gen_get_device_counter_storage(). Device counter registries only register "rx/tx/rx_bytes/tx_bytes" (see lib/controlplane/config/cp_device.c), so the NIC-prefixed counters won’t exist in this storage and the NIC RPC will return no real data. Either register these NIC counters in the device counter registry/storage, or change this lookup to point at the actual storage where NIC stats are written.
| rpc Function(FunctionCountersRequest) returns (CountersResponse) {} | ||
| rpc Chain(ChainCountersRequest) returns (CountersResponse) {} | ||
| rpc Module(ModuleCountersRequest) returns (CountersResponse) {} | ||
| rpc Nic(NicCounterRequest) returns (CountersResponse) {} | ||
| rpc Perf(PerfCountersRequest) returns (PerfCountersResponse) {} |
There was a problem hiding this comment.
The new rpc Nic is added to the public gRPC service, but there is no corresponding handler implemented in controlplane/internal/gateway/counters_service.go (so clients will get Unimplemented unless/ until it’s wired up). Please add the server-side method that calls DPConfig.NicCounters() (or remove the RPC until it’s supported end-to-end).
| struct nic_stat { | ||
| uint64_t ibytes; | ||
| uint64_t obytes; | ||
| uint64_t ipackets; | ||
| uint64_t opackets; | ||
| uint64_t ierrors; | ||
| uint64_t oerrors; | ||
| uint64_t rx_nombuf; | ||
| }; | ||
|
|
||
| void worker_unload_nic_stats(struct dataplane_worker *worker) { | ||
| struct rte_eth_stats stats1; | ||
| struct dp_worker *dp_worker = worker->dp_worker; | ||
|
|
||
| // const struct nic_stat stats0 = { | ||
| // .ibytes = *dp_worker->nic_rx_bytes, | ||
| // .obytes = *dp_worker->nic_tx_bytes, | ||
| // .ipackets = *dp_worker->nic_rx_packets, | ||
| // .opackets = *dp_worker->nic_tx_packets, | ||
| // .ierrors = *dp_worker->nic_rx_errors, | ||
| // .oerrors = *dp_worker->nic_tx_errors, | ||
| // .rx_nombuf = *dp_worker->nic_rx_nombuf | ||
| // }; | ||
|
|
||
|
|
||
| rte_eth_stats_get(worker->port_id, &stats1); | ||
|
|
||
| // struct nic_stat diff = { | ||
| // .ibytes = stats1.ibytes - stats0.ibytes, | ||
| // .obytes = stats1.obytes - stats0.obytes, | ||
| // .ipackets = stats1.ipackets - stats0.ipackets, | ||
| // .opackets = stats1.opackets - stats0.opackets, | ||
| // .ierrors = stats1.ierrors - stats0.ierrors, | ||
| // .oerrors = stats1.oerrors - stats0.oerrors, | ||
| // .rx_nombuf = stats1.rx_nombuf - stats0.rx_nombuf | ||
| // }; |
There was a problem hiding this comment.
struct nic_stat is currently unused (all related logic is commented out). Removing the unused type and commented code will keep the dataplane hot-path file easier to maintain.
| struct nic_stat { | |
| uint64_t ibytes; | |
| uint64_t obytes; | |
| uint64_t ipackets; | |
| uint64_t opackets; | |
| uint64_t ierrors; | |
| uint64_t oerrors; | |
| uint64_t rx_nombuf; | |
| }; | |
| void worker_unload_nic_stats(struct dataplane_worker *worker) { | |
| struct rte_eth_stats stats1; | |
| struct dp_worker *dp_worker = worker->dp_worker; | |
| // const struct nic_stat stats0 = { | |
| // .ibytes = *dp_worker->nic_rx_bytes, | |
| // .obytes = *dp_worker->nic_tx_bytes, | |
| // .ipackets = *dp_worker->nic_rx_packets, | |
| // .opackets = *dp_worker->nic_tx_packets, | |
| // .ierrors = *dp_worker->nic_rx_errors, | |
| // .oerrors = *dp_worker->nic_tx_errors, | |
| // .rx_nombuf = *dp_worker->nic_rx_nombuf | |
| // }; | |
| rte_eth_stats_get(worker->port_id, &stats1); | |
| // struct nic_stat diff = { | |
| // .ibytes = stats1.ibytes - stats0.ibytes, | |
| // .obytes = stats1.obytes - stats0.obytes, | |
| // .ipackets = stats1.ipackets - stats0.ipackets, | |
| // .opackets = stats1.opackets - stats0.opackets, | |
| // .ierrors = stats1.ierrors - stats0.ierrors, | |
| // .oerrors = stats1.oerrors - stats0.oerrors, | |
| // .rx_nombuf = stats1.rx_nombuf - stats0.rx_nombuf | |
| // }; | |
| void worker_unload_nic_stats(struct dataplane_worker *worker) { | |
| struct rte_eth_stats stats1; | |
| struct dp_worker *dp_worker = worker->dp_worker; | |
| rte_eth_stats_get(worker->port_id, &stats1); |
Closes #673